Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 1312 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 112.9 KiB |
| Average record size in memory | 88.1 B |
Variable types
| Numeric | 11 |
|---|---|
schooldist is highly correlated with council and 1 other fields | High correlation |
council is highly correlated with schooldist and 1 other fields | High correlation |
zipcode is highly correlated with schooldist and 1 other fields | High correlation |
lotarea is highly correlated with bldgarea and 1 other fields | High correlation |
bldgarea is highly correlated with lotarea and 1 other fields | High correlation |
unitstotal is highly correlated with lotarea and 1 other fields | High correlation |
block is highly correlated with council | High correlation |
schooldist is highly correlated with council and 1 other fields | High correlation |
council is highly correlated with block and 2 other fields | High correlation |
zipcode is highly correlated with schooldist and 1 other fields | High correlation |
landuse is highly correlated with unitstotal | High correlation |
lotarea is highly correlated with bldgarea | High correlation |
bldgarea is highly correlated with lotarea | High correlation |
unitstotal is highly correlated with landuse | High correlation |
landuse is highly correlated with yearbuilt and 1 other fields | High correlation |
zipcode is highly correlated with schooldist and 2 other fields | High correlation |
lotarea is highly correlated with unitstotal and 1 other fields | High correlation |
yearbuilt is highly correlated with landuse | High correlation |
schooldist is highly correlated with zipcode and 2 other fields | High correlation |
numfloors is highly correlated with bldgarea | High correlation |
council is highly correlated with zipcode and 3 other fields | High correlation |
block is highly correlated with zipcode and 2 other fields | High correlation |
unitstotal is highly correlated with lotarea and 1 other fields | High correlation |
bldgarea is highly correlated with lotarea and 2 other fields | High correlation |
df_index is highly correlated with landuse and 1 other fields | High correlation |
lotarea is highly skewed (γ1 = 26.28937583) | Skewed |
df_index has unique values | Unique |
Reproduction
| Analysis started | 2021-06-05 03:48:58.937824 |
|---|---|
| Analysis finished | 2021-06-05 03:49:15.189469 |
| Duration | 16.25 seconds |
| Software version | pandas-profiling v3.0.0 |
df_index
| Distinct | 1312 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 415825.7248 |
| Minimum | 7 |
| Maximum | 858669 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.4 KiB |
Quantile statistics
| Minimum | 7 |
|---|---|
| 5-th percentile | 16462.3 |
| Q1 | 164294.5 |
| median | 270229 |
| Q3 | 681278 |
| 95-th percentile | 797188.8 |
| Maximum | 858669 |
| Range | 858662 |
| Interquartile range (IQR) | 516983.5 |
Descriptive statistics
| Standard deviation | 281477.3367 |
|---|---|
| Coefficient of variation (CV) | 0.6769117923 |
| Kurtosis | -1.64255652 |
| Mean | 415825.7248 |
| Median Absolute Deviation (MAD) | 261976.5 |
| Skewness | 0.08677248126 |
| Sum | 545563351 |
| Variance | 7.922949106 × 10^10 |
| Monotonicity | Not monotonic |
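Several rows in the quantile and descriptive tables above are derived from other reported values. The following is a minimal sketch that recomputes them from the figures printed above; the variable names are illustrative and not part of the report.

```python
# Values copied from the tables above (first profiled variable).
mean = 415825.7248
std = 281477.3367
q1, q3 = 164294.5, 681278
minimum, maximum = 7, 858669

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

# Interquartile range: spread of the middle 50% of the values.
iqr = q3 - q1

# Range: distance between the smallest and largest value.
value_range = maximum - minimum

# Variance is the square of the standard deviation.
variance = std ** 2

print(f"CV={cv:.10f}  IQR={iqr}  range={value_range}  variance={variance:.6e}")
```

The recomputed figures agree with the reported rows up to rounding, which is a useful consistency check when a report has been copied or re-rendered.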
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 75777 | 1 | 0.1% |
| 642442 | 1 | 0.1% |
| 200442 | 1 | 0.1% |
| 241029 | 1 | 0.1% |
| 269519 | 1 | 0.1% |
| 241027 | 1 | 0.1% |
| 268006 | 1 | 0.1% |
| 603518 | 1 | 0.1% |
| 71037 | 1 | 0.1% |
| 603516 | 1 | 0.1% |
| Other values (1302) | 1302 |
| Value | Count | Frequency (%) |
| 7 | 1 | |
| 21 | 1 | |
| 32 | 1 | |
| 33 | 1 | |
| 394 | 1 | |
| 403 | 1 | |
| 424 | 1 | |
| 439 | 1 | |
| 491 | 1 | |
| 857 | 1 |
| Value | Count | Frequency (%) |
| 858669 | 1 | |
| 855445 | 1 | |
| 853648 | 1 | |
| 853646 | 1 | |
| 845489 | 1 | |
| 845434 | 1 | |
| 845270 | 1 | |
| 842007 | 1 | |
| 841008 | 1 | |
| 840979 | 1 |
block
| Distinct | 740 |
|---|---|
| Distinct (%) | 56.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1191.785061 |
| Minimum | 4 |
| Maximum | 15638 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.4 KiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 28 |
| Q1 | 778.75 |
| median | 1104.5 |
| Q3 | 1350.25 |
| 95-th percentile | 2427.05 |
| Maximum | 15638 |
| Range | 15634 |
| Interquartile range (IQR) | 571.5 |
Descriptive statistics
| Standard deviation | 1181.705809 |
|---|---|
| Coefficient of variation (CV) | 0.9915427267 |
| Kurtosis | 45.08573256 |
| Mean | 1191.785061 |
| Median Absolute Deviation (MAD) | 295 |
| Skewness | 5.255268283 |
| Sum | 1563622 |
| Variance | 1396428.619 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 16 | 20 | 1.5% |
| 1171 | 10 | 0.8% |
| 1118 | 9 | 0.7% |
| 21 | 8 | 0.6% |
| 1275 | 8 | 0.6% |
| 763 | 8 | 0.6% |
| 1269 | 7 | 0.5% |
| 1158 | 7 | 0.5% |
| 1374 | 6 | 0.5% |
| 760 | 6 | 0.5% |
| Other values (730) | 1223 |
| Value | Count | Frequency (%) |
| 4 | 1 | 0.1% |
| 5 | 1 | 0.1% |
| 6 | 4 | 0.3% |
| 9 | 2 | 0.2% |
| 10 | 1 | 0.1% |
| 11 | 2 | 0.2% |
| 13 | 1 | 0.1% |
| 15 | 2 | 0.2% |
| 16 | 20 | |
| 17 | 3 | 0.2% |
| Value | Count | Frequency (%) |
| 15638 | 1 | 0.1% |
| 15610 | 1 | 0.1% |
| 10101 | 1 | 0.1% |
| 9998 | 1 | 0.1% |
| 7459 | 1 | 0.1% |
| 7279 | 1 | 0.1% |
| 7274 | 5 | |
| 7273 | 2 | 0.2% |
| 7253 | 1 | 0.1% |
| 7250 | 1 | 0.1% |
schooldist
| Distinct | 27 |
|---|---|
| Distinct (%) | 2.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.256097561 |
| Minimum | 1 |
| Maximum | 31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 2 |
| median | 2 |
| Q3 | 2 |
| 95-th percentile | 18.35 |
| Maximum | 31 |
| Range | 30 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 6.058320124 |
|---|---|
| Coefficient of variation (CV) | 1.423444843 |
| Kurtosis | 8.845706856 |
| Mean | 4.256097561 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.062523127 |
| Sum | 5584 |
| Variance | 36.70324273 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=27)
| Value | Count | Frequency (%) |
| 2 | 983 | |
| 3 | 100 | 7.6% |
| 13 | 39 | 3.0% |
| 30 | 31 | 2.4% |
| 14 | 21 | 1.6% |
| 1 | 19 | 1.4% |
| 5 | 17 | 1.3% |
| 21 | 13 | 1.0% |
| 15 | 12 | 0.9% |
| 7 | 10 | 0.8% |
| Other values (17) | 67 | 5.1% |
| Value | Count | Frequency (%) |
| 1 | 19 | 1.4% |
| 2 | 983 | |
| 3 | 100 | 7.6% |
| 4 | 9 | 0.7% |
| 5 | 17 | 1.3% |
| 6 | 10 | 0.8% |
| 7 | 10 | 0.8% |
| 8 | 2 | 0.2% |
| 9 | 5 | 0.4% |
| 10 | 7 | 0.5% |
| Value | Count | Frequency (%) |
| 31 | 1 | 0.1% |
| 30 | 31 | |
| 28 | 10 | 0.8% |
| 27 | 2 | 0.2% |
| 25 | 3 | 0.2% |
| 24 | 1 | 0.1% |
| 23 | 3 | 0.2% |
| 22 | 1 | 0.1% |
| 21 | 13 | |
| 20 | 1 | 0.1% |
council
| Distinct | 37 |
|---|---|
| Distinct (%) | 2.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.041158537 |
| Minimum | 1 |
| Maximum | 50 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 4 |
| Q3 | 5 |
| 95-th percentile | 33 |
| Maximum | 50 |
| Range | 49 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 9.526060343 |
|---|---|
| Coefficient of variation (CV) | 1.35291093 |
| Kurtosis | 5.489807161 |
| Mean | 7.041158537 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 2.539203771 |
| Sum | 9238 |
| Variance | 90.74582566 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=37)
| Value | Count | Frequency (%) |
| 4 | 438 | |
| 3 | 212 | |
| 1 | 178 | |
| 5 | 110 | 8.4% |
| 6 | 86 | 6.6% |
| 2 | 72 | 5.5% |
| 33 | 54 | 4.1% |
| 26 | 31 | 2.4% |
| 35 | 17 | 1.3% |
| 7 | 14 | 1.1% |
| Other values (27) | 100 | 7.6% |
| Value | Count | Frequency (%) |
| 1 | 178 | |
| 2 | 72 | 5.5% |
| 3 | 212 | |
| 4 | 438 | |
| 5 | 110 | 8.4% |
| 6 | 86 | 6.6% |
| 7 | 14 | 1.1% |
| 8 | 11 | 0.8% |
| 9 | 12 | 0.9% |
| 10 | 9 | 0.7% |
| Value | Count | Frequency (%) |
| 50 | 1 | 0.1% |
| 48 | 9 | |
| 47 | 4 | |
| 43 | 1 | 0.1% |
| 42 | 1 | 0.1% |
| 41 | 1 | 0.1% |
| 40 | 1 | 0.1% |
| 38 | 1 | 0.1% |
| 37 | 1 | 0.1% |
| 36 | 1 | 0.1% |
zipcode
| Distinct | 97 |
|---|---|
| Distinct (%) | 7.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10168.26982 |
| Minimum | 10001 |
| Maximum | 11691 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.4 KiB |
Quantile statistics
| Minimum | 10001 |
|---|---|
| 5-th percentile | 10001 |
| Q1 | 10016 |
| median | 10022 |
| Q3 | 10038 |
| 95-th percentile | 11212 |
| Maximum | 11691 |
| Range | 1690 |
| Interquartile range (IQR) | 22 |
Descriptive statistics
| Standard deviation | 372.666339 |
|---|---|
| Coefficient of variation (CV) | 0.03664992626 |
| Kurtosis | 4.111094274 |
| Mean | 10168.26982 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 2.405819607 |
| Sum | 13340770 |
| Variance | 138880.2002 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 10022 | 112 | 8.5% |
| 10017 | 96 | 7.3% |
| 10019 | 90 | 6.9% |
| 10016 | 83 | 6.3% |
| 10036 | 71 | 5.4% |
| 10018 | 70 | 5.3% |
| 10001 | 68 | 5.2% |
| 10023 | 61 | 4.6% |
| 10128 | 40 | 3.0% |
| 11201 | 37 | 2.8% |
| Other values (87) | 584 |
| Value | Count | Frequency (%) |
| 10001 | 68 | |
| 10002 | 14 | 1.1% |
| 10003 | 15 | 1.1% |
| 10004 | 23 | 1.8% |
| 10005 | 28 | |
| 10006 | 20 | 1.5% |
| 10007 | 26 | 2.0% |
| 10009 | 2 | 0.2% |
| 10010 | 29 | |
| 10011 | 16 | 1.2% |
| Value | Count | Frequency (%) |
| 11691 | 2 | 0.2% |
| 11435 | 1 | 0.1% |
| 11433 | 1 | 0.1% |
| 11415 | 1 | 0.1% |
| 11379 | 1 | 0.1% |
| 11375 | 5 | |
| 11374 | 2 | 0.2% |
| 11365 | 1 | 0.1% |
| 11355 | 2 | 0.2% |
| 11249 | 10 |
landuse
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.22027439 |
| Minimum | 1 |
| Maximum | 8 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 4 |
| median | 4 |
| Q3 | 5 |
| 95-th percentile | 5 |
| Maximum | 8 |
| Range | 7 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.9322351204 |
|---|---|
| Coefficient of variation (CV) | 0.2208944334 |
| Kurtosis | 2.976752232 |
| Mean | 4.22027439 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.8845470006 |
| Sum | 5537 |
| Variance | 0.8690623198 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 5 | 499 | |
| 4 | 483 | |
| 3 | 303 | |
| 8 | 24 | 1.8% |
| 6 | 1 | 0.1% |
| 2 | 1 | 0.1% |
| 1 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 1 | 1 | 0.1% |
| 2 | 1 | 0.1% |
| 3 | 303 | |
| 4 | 483 | |
| 5 | 499 | |
| 6 | 1 | 0.1% |
| 8 | 24 | 1.8% |
| Value | Count | Frequency (%) |
| 8 | 24 | 1.8% |
| 6 | 1 | 0.1% |
| 5 | 499 | |
| 4 | 483 | |
| 3 | 303 | |
| 2 | 1 | 0.1% |
| 1 | 1 | 0.1% |
lotarea
| Distinct | 1198 |
|---|---|
| Distinct (%) | 91.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 44862.3407 |
| Minimum | 1506 |
| Maximum | 5048550 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.4 KiB |
Quantile statistics
| Minimum | 1506 |
|---|---|
| 5-th percentile | 4938 |
| Q1 | 11146.75 |
| median | 21579 |
| Q3 | 41603.5 |
| 95-th percentile | 139416.3 |
| Maximum | 5048550 |
| Range | 5047044 |
| Interquartile range (IQR) | 30456.75 |
Descriptive statistics
| Standard deviation | 154873.9803 |
|---|---|
| Coefficient of variation (CV) | 3.452204629 |
| Kurtosis | 833.9838104 |
| Mean | 44862.3407 |
| Median Absolute Deviation (MAD) | 12501.5 |
| Skewness | 26.28937583 |
| Sum | 58859391 |
| Variance | 2.398594976 × 10^10 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 7406 | 8 | 0.6% |
| 9875 | 7 | 0.5% |
| 10042 | 6 | 0.5% |
| 6025 | 6 | 0.5% |
| 7500 | 5 | 0.4% |
| 6024 | 4 | 0.3% |
| 5021 | 4 | 0.3% |
| 7531 | 4 | 0.3% |
| 24100 | 4 | 0.3% |
| 12552 | 4 | 0.3% |
| Other values (1188) | 1260 |
| Value | Count | Frequency (%) |
| 1506 | 2 | |
| 1942 | 1 | 0.1% |
| 2025 | 1 | 0.1% |
| 2143 | 1 | 0.1% |
| 2150 | 1 | 0.1% |
| 2209 | 2 | |
| 2468 | 1 | 0.1% |
| 2469 | 1 | 0.1% |
| 2475 | 1 | 0.1% |
| 2510 | 3 |
| Value | Count | Frequency (%) |
| 5048550 | 1 | |
| 856800 | 1 | |
| 833945 | 1 | |
| 746956 | 1 | |
| 659375 | 1 | |
| 622700 | 1 | |
| 539730 | 1 | |
| 519220 | 1 | |
| 393100 | 1 | |
| 375650 | 1 |
bldgarea
| Distinct | 1295 |
|---|---|
| Distinct (%) | 98.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 470078.9345 |
| Minimum | 1344 |
| Maximum | 13540113 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.4 KiB |
Quantile statistics
| Minimum | 1344 |
|---|---|
| 5-th percentile | 75955.9 |
| Q1 | 185485.25 |
| median | 325086.5 |
| Q3 | 547810 |
| 95-th percentile | 1354649.5 |
| Maximum | 13540113 |
| Range | 13538769 |
| Interquartile range (IQR) | 362324.75 |
Descriptive statistics
| Standard deviation | 612529.6512 |
|---|---|
| Coefficient of variation (CV) | 1.303035738 |
| Kurtosis | 185.4988725 |
| Mean | 470078.9345 |
| Median Absolute Deviation (MAD) | 164582 |
| Skewness | 10.25882396 |
| Sum | 616743562 |
| Variance | 3.751925736 × 10^11 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 332608 | 3 | 0.2% |
| 623806 | 3 | 0.2% |
| 74291 | 2 | 0.2% |
| 370990 | 2 | 0.2% |
| 50648 | 2 | 0.2% |
| 177000 | 2 | 0.2% |
| 431000 | 2 | 0.2% |
| 470000 | 2 | 0.2% |
| 216247 | 2 | 0.2% |
| 96420 | 2 | 0.2% |
| Other values (1285) | 1290 |
| Value | Count | Frequency (%) |
| 1344 | 1 | |
| 3146 | 1 | |
| 3280 | 1 | |
| 24212 | 1 | |
| 28884 | 1 | |
| 35219 | 1 | |
| 35670 | 1 | |
| 38353 | 2 | |
| 39291 | 1 | |
| 39964 | 1 |
| Value | Count | Frequency (%) |
| 13540113 | 1 | |
| 8837500 | 1 | |
| 3693539 | 1 | |
| 3221237 | 1 | |
| 2907315 | 1 | |
| 2812739 | 1 | |
| 2734038 | 1 | |
| 2689635 | 1 | |
| 2636182 | 1 | |
| 2531670 | 1 |
numfloors
| Distinct | 65 |
|---|---|
| Distinct (%) | 5.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 32.55678354 |
| Minimum | 20.5 |
| Maximum | 104 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.4 KiB |
Quantile statistics
| Minimum | 20.5 |
|---|---|
| 5-th percentile | 21 |
| Q1 | 24 |
| median | 30 |
| Q3 | 38 |
| 95-th percentile | 55.45 |
| Maximum | 104 |
| Range | 83.5 |
| Interquartile range (IQR) | 14 |
Descriptive statistics
| Standard deviation | 11.67360651 |
|---|---|
| Coefficient of variation (CV) | 0.3585614192 |
| Kurtosis | 4.113977272 |
| Mean | 32.55678354 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 1.694656403 |
| Sum | 42714.5 |
| Variance | 136.273089 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 21 | 152 | 11.6% |
| 22 | 90 | 6.9% |
| 23 | 74 | 5.6% |
| 24 | 69 | 5.3% |
| 26 | 69 | 5.3% |
| 25 | 63 | 4.8% |
| 30 | 55 | 4.2% |
| 32 | 53 | 4.0% |
| 27 | 48 | 3.7% |
| 31 | 48 | 3.7% |
| Other values (55) | 591 |
| Value | Count | Frequency (%) |
| 20.5 | 2 | 0.2% |
| 21 | 152 | |
| 22 | 90 | |
| 22.5 | 1 | 0.1% |
| 23 | 74 | |
| 23.5 | 1 | 0.1% |
| 24 | 69 | |
| 25 | 63 | |
| 26 | 69 | |
| 27 | 48 | 3.7% |
| Value | Count | Frequency (%) |
| 104 | 1 | 0.1% |
| 102 | 1 | 0.1% |
| 90 | 1 | 0.1% |
| 88 | 2 | 0.2% |
| 82 | 1 | 0.1% |
| 78 | 1 | 0.1% |
| 77 | 1 | 0.1% |
| 76 | 1 | 0.1% |
| 73 | 6 | |
| 72 | 1 | 0.1% |
unitstotal
| Distinct | 499 |
|---|---|
| Distinct (%) | 38.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 221.2073171 |
| Minimum | 0 |
| Maximum | 10948 |
| Zeros | 6 |
| Zeros (%) | 0.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 36 |
| median | 142 |
| Q3 | 306 |
| 95-th percentile | 705.9 |
| Maximum | 10948 |
| Range | 10948 |
| Interquartile range (IQR) | 270 |
Descriptive statistics
| Standard deviation | 385.4302059 |
|---|---|
| Coefficient of variation (CV) | 1.742393566 |
| Kurtosis | 458.2795667 |
| Mean | 221.2073171 |
| Median Absolute Deviation (MAD) | 119 |
| Skewness | 17.01352631 |
| Sum | 290224 |
| Variance | 148556.4436 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 107 | 8.2% |
| 2 | 48 | 3.7% |
| 3 | 20 | 1.5% |
| 184 | 12 | 0.9% |
| 40 | 10 | 0.8% |
| 5 | 9 | 0.7% |
| 4 | 9 | 0.7% |
| 35 | 9 | 0.7% |
| 79 | 8 | 0.6% |
| 7 | 8 | 0.6% |
| Other values (489) | 1072 |
| Value | Count | Frequency (%) |
| 0 | 6 | 0.5% |
| 1 | 107 | |
| 2 | 48 | |
| 3 | 20 | 1.5% |
| 4 | 9 | 0.7% |
| 5 | 9 | 0.7% |
| 6 | 5 | 0.4% |
| 7 | 8 | 0.6% |
| 8 | 5 | 0.4% |
| 9 | 2 | 0.2% |
| Value | Count | Frequency (%) |
| 10948 | 1 | |
| 1706 | 1 | |
| 1660 | 1 | |
| 1615 | 1 | |
| 1604 | 1 | |
| 1547 | 1 | |
| 1521 | 1 | |
| 1349 | 1 | |
| 1332 | 1 | |
| 1321 | 1 |
yearbuilt
| Distinct | 116 |
|---|---|
| Distinct (%) | 8.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1973.112043 |
| Minimum | 0 |
| Maximum | 2020 |
| Zeros | 3 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1922 |
| Q1 | 1962 |
| median | 1981 |
| Q3 | 2006 |
| 95-th percentile | 2018 |
| Maximum | 2020 |
| Range | 2020 |
| Interquartile range (IQR) | 44 |
Descriptive statistics
| Standard deviation | 99.53538339 |
|---|---|
| Coefficient of variation (CV) | 0.0504458851 |
| Kurtosis | 351.9940468 |
| Mean | 1973.112043 |
| Median Absolute Deviation (MAD) | 23 |
| Skewness | -17.85574525 |
| Sum | 2588723 |
| Variance | 9907.292547 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2015 | 42 | 3.2% |
| 1963 | 38 | 2.9% |
| 1964 | 37 | 2.8% |
| 1987 | 35 | 2.7% |
| 2006 | 32 | 2.4% |
| 1929 | 30 | 2.3% |
| 2018 | 30 | 2.3% |
| 1986 | 29 | 2.2% |
| 2007 | 28 | 2.1% |
| 2019 | 28 | 2.1% |
| Other values (106) | 983 |
| Value | Count | Frequency (%) |
| 0 | 3 | |
| 1883 | 1 | 0.1% |
| 1895 | 1 | 0.1% |
| 1896 | 1 | 0.1% |
| 1899 | 1 | 0.1% |
| 1900 | 3 | |
| 1901 | 1 | 0.1% |
| 1902 | 1 | 0.1% |
| 1903 | 1 | 0.1% |
| 1904 | 2 |
| Value | Count | Frequency (%) |
| 2020 | 24 | |
| 2019 | 28 | |
| 2018 | 30 | |
| 2017 | 24 | |
| 2016 | 24 | |
| 2015 | 42 | |
| 2014 | 22 | |
| 2013 | 22 | |
| 2012 | 22 | |
| 2011 | 8 | 0.6% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear relationship the steepness of the line does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
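The definition above can be sketched in plain Python. This is an illustrative implementation, not the code the profiling tool runs; the function name `pearson_r` is hypothetical, and the sample values are the `lotarea` and `bldgarea` columns from the first five rows of the sample table.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# lotarea and bldgarea from the first five sample rows.
lotarea = [11050.0, 81325.0, 89622.0, 34167.0, 18900.0]
bldgarea = [268106.0, 1526121.0, 381213.0, 440709.0, 232400.0]
print(pearson_r(lotarea, bldgarea))

# Rescaling or shifting one variable leaves r unchanged (up to float rounding).
print(pearson_r([2 * a + 100 for a in lotarea], bldgarea))
```

Five rows are far too few to reproduce the correlation reported for the full dataset; the example only shows the mechanics and the invariance property.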
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
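As a rough sketch of that recipe, the snippet below ranks each variable (giving tied values the average of their positions, which is one common convention and an assumption here) and then applies the Pearson formula to the ranks. The helper names `average_ranks` and `spearman_rho` are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# A nonlinear but strictly increasing relationship has identical rank orderings.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
```

Because only the ordering matters, any strictly increasing transformation of either variable leaves ρ unchanged, which is why heavily skewed fields such as `lotarea` can correlate more strongly under Spearman than under Pearson.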
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
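The pair-counting definition above corresponds to the variant often called tau-a. The sketch below implements exactly that description; the function name `kendall_tau_a` is illustrative, and statistical libraries frequently report a tie-adjusted variant instead, so values can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y: counted in neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Three observations: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The quadratic loop over all pairs is fine for a dataset of this size (1312 rows give roughly 860,000 pairs), though faster algorithms exist for larger inputs.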
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | df_index | block | schooldist | council | zipcode | landuse | lotarea | bldgarea | numfloors | unitstotal | yearbuilt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 201606 | 1307 | 2.0 | 4.0 | 10022.0 | 5.0 | 11050.0 | 268106.0 | 35.0 | 45.0 | 1983.0 |
| 1 | 156742 | 1308 | 2.0 | 4.0 | 10022.0 | 5.0 | 81325.0 | 1526121.0 | 39.0 | 2.0 | 1969.0 |
| 2 | 282941 | 3251 | 10.0 | 11.0 | 10468.0 | 3.0 | 89622.0 | 381213.0 | 21.0 | 352.0 | 1967.0 |
| 3 | 243873 | 735 | 2.0 | 3.0 | 10018.0 | 3.0 | 34167.0 | 440709.0 | 23.0 | 399.0 | 2008.0 |
| 4 | 241399 | 1576 | 2.0 | 5.0 | 10075.0 | 3.0 | 18900.0 | 232400.0 | 30.0 | 163.0 | 1981.0 |
| 5 | 128222 | 1505 | 2.0 | 4.0 | 10128.0 | 4.0 | 22102.0 | 302439.0 | 32.0 | 212.0 | 1984.0 |
| 6 | 241148 | 1142 | 3.0 | 6.0 | 10023.0 | 4.0 | 12778.0 | 149314.0 | 21.0 | 125.0 | 1989.0 |
| 7 | 746216 | 15638 | 27.0 | 31.0 | 11691.0 | 3.0 | 263791.0 | 837935.0 | 26.0 | 606.0 | 1971.0 |
| 8 | 603418 | 1011 | 2.0 | 4.0 | 10019.0 | 3.0 | 14250.0 | 307549.0 | 36.0 | 198.0 | 1940.0 |
| 9 | 200463 | 997 | 2.0 | 4.0 | 10036.0 | 5.0 | 15565.0 | 426056.0 | 40.0 | 66.0 | 1988.0 |
Last rows
| | df_index | block | schooldist | council | zipcode | landuse | lotarea | bldgarea | numfloors | unitstotal | yearbuilt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1302 | 642313 | 1485 | 2.0 | 5.0 | 10021.0 | 8.0 | 39547.0 | 757439.0 | 24.0 | 1.0 | 2015.0 |
| 1303 | 200583 | 1024 | 2.0 | 3.0 | 10019.0 | 5.0 | 23900.0 | 762619.0 | 35.0 | 1.0 | 1987.0 |
| 1304 | 153741 | 1314 | 2.0 | 4.0 | 10016.0 | 8.0 | 19701.0 | 279254.0 | 25.0 | 24.0 | 2001.0 |
| 1305 | 608074 | 2623 | 7.0 | 17.0 | 10455.0 | 3.0 | 166139.0 | 422400.0 | 22.0 | 471.0 | 1960.0 |
| 1306 | 746585 | 967 | 2.0 | 4.0 | 10016.0 | 3.0 | 45190.0 | 922828.0 | 47.0 | 764.0 | 2014.0 |
| 1307 | 268057 | 840 | 2.0 | 4.0 | 10018.0 | 5.0 | 4148.0 | 88551.0 | 34.0 | 173.0 | 2018.0 |
| 1308 | 274079 | 2170 | 6.0 | 10.0 | 10040.0 | 3.0 | 96675.0 | 223200.0 | 21.0 | 205.0 | 1959.0 |
| 1309 | 680761 | 861 | 2.0 | 4.0 | 10016.0 | 4.0 | 8400.0 | 175687.0 | 35.0 | 166.0 | 2008.0 |
| 1310 | 200392 | 811 | 2.0 | 3.0 | 10018.0 | 5.0 | 19750.0 | 408511.0 | 22.0 | 88.0 | 1925.0 |
| 1311 | 227378 | 1037 | 2.0 | 3.0 | 10036.0 | 5.0 | 3292.0 | 75902.0 | 29.0 | 1.0 | 2014.0 |